Song Popularity with SpotifyAPI

An analysis of song popularity through Spotify-generated characteristics.

Hannah Ahn

Introduction

What makes a song… popular? Evaluating that empirically is difficult. Many variables go into creating a song: melody, rhythm and tempo, lyrics, structure, message, and dynamics are just a few that come to mind, and each one contains further variation of its own. In other words, a combination of many things contributes to a song’s popularity. While I can’t answer the question “what makes a song popular?” within the scope of this project, I can evaluate the relationship between three musical features (danceability, valence, and speechiness) and popularity across different genres. For every track on its platform, Spotify provides data for thirteen “audio features.” I’ve chosen to focus on three: danceability, valence, and speechiness. Spotify defines them as follows:

Danceability: Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.

Valence: Describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

Speechiness: This detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.

In this study I evaluate six different genres of music (pop, edm, rap, r&b, latin, and rock) with these three features, and hypothesize that each can influence a song’s popularity within each genre. Understanding these relationships is useful not only for the music industry but also for music researchers, as it can inform musicians, producers, and playlist curators about the elements that contribute to a song’s popularity. This analysis contributes to the broader understanding of how musical preferences vary across genres and the impact of specific features on a song’s overall appeal.

Data

I initially tried to use SpotifyAPI directly through the spotifyr package, but kept getting various server errors. Instead, I decided to use someone else’s dataset on Kaggle, which was created using SpotifyAPI (https://www.kaggle.com/datasets/joebeachcapital/30000-spotify-songs/). The dataset includes roughly 30,000 songs from Spotify (32,833 rows in the version used here) of various genres and national origins. It contains the following columns:

track_id
track_name
track_artist
track_popularity
track_album_id
track_album_name
track_album_release_date
playlist_name
playlist_id
playlist_genre
playlist_subgenre
danceability
energy
key
loudness
mode
speechiness
acousticness
instrumentalness
liveness
valence
tempo
duration_ms
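The code later in this report assumes the Kaggle CSV has already been read into a data frame called songs. A minimal sketch of that step is below; the helper name load_songs and the file path are hypothetical and not part of the original analysis. It reads the file and checks that the columns used in the regressions are present.

```r
# Hypothetical helper: read the Kaggle CSV and confirm that the columns
# used in this analysis are present. The file path is an assumption.
load_songs <- function(path) {
  songs <- read.csv(path, stringsAsFactors = FALSE)
  required <- c("track_popularity", "playlist_genre",
                "danceability", "valence", "speechiness")
  missing_cols <- setdiff(required, names(songs))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  songs
}

# Example call (the path is a placeholder for wherever the CSV was saved):
# songs <- load_songs("spotify_songs.csv")
```

Failing early on a missing column is a deliberate choice here: a renamed or dropped column would otherwise only surface later as a less readable error inside lm() or ggplot().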

The independent variables in this study are the six genres (pop, rap, rock, r&b, latin, and edm) and three musical attributes (danceability, valence, and speechiness). The attributes are scores between 0 and 1, recorded to three decimal places, and are generated by Spotify. While Spotify’s exact specifications for what constitutes each attribute are opaque, it seems as though they take into account tempo, dynamics, key, and other variables for danceability and valence, while speechiness is an evaluation of the ratio of spoken words to music.

The dependent variable is song popularity (track_popularity), which is measured by Spotify. It is a popularity index, from 0 to 100, that ranks how popular a track is relative to other tracks on Spotify. Increasing this score can increase a song’s algorithmic reach (appearing in people’s recommendations and Spotify-generated playlists). According to Spotify, the popularity index is influenced by recent stream count, save rate, number of playlists, skip rate, and share rate.

This study uses a cross-sectional design: the data was collected at a single point in time using SpotifyAPI, and the analysis examines the relationships between variables as they exist at that moment. The data is observational, and the study is neither a before-and-after nor a difference-in-differences design.

In this study I evaluate six different genres of music (pop, edm, rap, r&b, latin, and rock) with these three features. My primary hypothesis posits that certain musical attributes can predict a track’s popularity within distinct genres. Specifically, I hypothesize that songs with higher danceability will be more popular in genres like pop, edm, latin, and rap, but less popular in r&b and rock, and that the same relationship exists with valence. I also hypothesize that songs with higher speechiness will be less popular in genres like pop and edm, but more popular in the remaining genres.

I believe these hypotheses might be true because genres like edm, pop, latin, and rap often prioritize energetic and uplifting music that encourages movement. Higher danceability and valence align with the upbeat and lively nature of pop, edm, latin, and rap, making such songs more likely to be embraced by listeners seeking an energetic experience. R&b and rock genres often feature a broader range of emotional expression, including slower tempos and more complex arrangements. Higher danceability and valence may be perceived as less characteristic of the emotional depth and complexity associated with these genres, potentially making them less popular among audiences seeking a different musical experience.

With regard to speechiness, pop and edm genres often emphasize catchy melodies and instrumentals, with sung vocals playing a crucial role. Higher speechiness, indicating a greater presence of spoken-word elements, might be perceived as less favorable in genres where melodic, instrumental, and electronic elements are more prominent. In rap, r&b, latin, and rock genres, lyrics and vocal expression often play a more central role. Higher speechiness may be associated with lyrical depth and storytelling, enhancing the appeal of songs in these genres. Additionally, genres like rap often rely on clear and impactful vocal delivery, making speechiness a potentially positive factor in popularity.

My hypotheses would be supported by the following: observing a positive relationship between danceability/valence and song popularity in genres pop, edm, latin, and rap, observing a negative relationship between danceability/valence and song popularity in genres r&b and rock, and observing a negative relationship between speechiness and song popularity in genres pop and edm. If I find the opposite to be true of these proposed relationships, those findings would provide evidence against my hypotheses.

The bar plot below illustrates the distribution of the dependent variable, popularity. As you can see, a fair number of songs have a low score. While it may seem like the data is skewed, 50 is considered a “good” score by Spotify, and a song with a score of 50 will generally get picked up by the algorithm and recommended to users. I would consider the distribution of popularity and the sample of songs to be reasonable and not very biased.

# Load plotting packages (ggthemes provides theme_few())
library(ggplot2)
library(ggthemes)

# Creating a bar plot that shows distribution of popularity

popularity_bar <- ggplot(songs, aes(x = track_popularity)) +
  geom_bar() +
  labs(
    title = "Popularity",
    x = "Popularity Score",
    y = "Number of Songs"
  ) +
  theme_few()

popularity_bar
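
To put numbers on what the bar plot shows, a small summary function can help. The sketch below is illustrative only: summarise_popularity is a hypothetical helper, and the scores passed to it are toy values rather than the Spotify data. It reports the quartiles, the share of tracks scoring exactly 0, and the share at or above a threshold (50, the “good” score mentioned above).

```r
# Summarise a 0-100 popularity score: quartiles plus the share of tracks
# at exactly 0 and at or above a threshold.
summarise_popularity <- function(popularity, threshold = 50) {
  out <- c(
    quantile(popularity, probs = c(0.25, 0.5, 0.75), names = FALSE),
    mean(popularity == 0),
    mean(popularity >= threshold)
  )
  names(out) <- c("q25", "median", "q75", "share_zero", "share_at_or_above")
  out
}

# Toy scores for illustration only; with the real data one would pass
# songs$track_popularity instead.
toy_popularity <- c(0, 0, 12, 35, 48, 50, 63, 71, 88, 100)
popularity_summary <- summarise_popularity(toy_popularity)
popularity_summary
```

With the toy vector, two of ten scores are 0 and five are at or above 50, so the last two entries are 0.2 and 0.5.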

Results

The following plots visualize the relationships outlined in the Data section. They are interactive, so feel free to mouse over and inspect some of the data points if curious.

# Creating scatter plots

# Scatter plot with regression line for Danceability vs Track Popularity
# (songs is passed to ggplot() once; piping it in as well would supply it twice)
danceability_plot <- ggplot(songs, mapping = aes(
  x = danceability, 
  y = track_popularity, 
  fill = playlist_genre)) +
  geom_point(alpha = 0.1, stroke = NA) +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  labs(
    title = "Danceability vs Popularity", 
    x = "Danceability", 
    y = "Popularity", 
    fill = "Playlist Genre") +
  facet_wrap(~ playlist_genre) +
  theme_few() 

# Scatter plot with regression line for Valence vs Track Popularity
valence_plot <- ggplot(songs, mapping = aes(
  x = valence, 
  y = track_popularity, 
  fill = playlist_genre)) +
  geom_point(alpha = 0.1, stroke = NA) +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  labs(
    title = "Valence vs Popularity", 
    x = "Valence", 
    y = "Popularity", 
    fill = "Playlist Genre") +
  facet_wrap(~ playlist_genre) +
  theme_few()

# Scatter plot with regression line for Speechiness vs Track Popularity
speechiness_plot <- ggplot(songs, mapping = aes(
  x = speechiness, 
  y = track_popularity, 
  fill = playlist_genre)) +
  geom_point(alpha = 0.1, stroke = NA) +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  labs(
    title = "Speechiness vs Popularity", 
    x = "Speechiness", 
    y = "Popularity", 
    fill = "Playlist Genre") +
  facet_wrap(~ playlist_genre) +
  theme_few()

# Showing plots (ggplotly() comes from the plotly package)
library(plotly)
ggplotly(danceability_plot)
ggplotly(valence_plot)
ggplotly(speechiness_plot)

I’ve used loess smoothing here to capture some of the nuances in the data before conducting a linear regression analysis. From these visualizations alone, we can observe the following.
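
To illustrate why a loess curve can reveal structure that a straight line hides, here is a minimal sketch on synthetic data (not the Spotify data). The toy response is deliberately U-shaped, which is an assumption made purely for illustration. It compares stats::loess(), the base R function behind geom_smooth(method = "loess"), with an ordinary lm() fit.

```r
set.seed(1)

# Synthetic stand-in for one genre: a feature on [0, 1] and a noisy,
# deliberately curved popularity response. Not the Spotify data.
toy <- data.frame(danceability = runif(300))
toy$track_popularity <- 20 + 120 * (toy$danceability - 0.5)^2 +
  rnorm(300, sd = 5)

# Straight-line fit versus local (loess) fit on the same data
linear_fit <- lm(track_popularity ~ danceability, data = toy)
loess_fit  <- loess(track_popularity ~ danceability, data = toy)

# Compare fitted values on a grid inside the observed range
grid <- data.frame(danceability = seq(0.05, 0.95, length.out = 7))
comparison <- cbind(
  grid,
  linear = predict(linear_fit, newdata = grid),
  loess  = predict(loess_fit, newdata = grid)
)
round(comparison, 1)
```

In this toy case the linear fit is nearly flat while the loess fit dips in the middle of the range, which is the kind of nuance a purely linear summary would miss.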

The Danceability vs Popularity graphs show us that there may be a positive correlation between danceability and popularity for pop, rap, r&b, and rock.

The Valence vs Popularity graphs show us that there may be a positive correlation between valence and popularity for edm, and perhaps an ever so slight positive correlation for pop and rock. It also looks as though there may be a slight negative correlation for latin and rap, and a negative correlation for r&b.

The Speechiness vs Popularity graphs show that, generally, songs are not very “speechy.” This makes sense, considering that a song that is mostly words and no music would not be much of a song. According to the graphs, there may be a slightly positive correlation between speechiness and popularity for edm, r&b, and rock.

Now, let us more rigorously evaluate the relationships illustrated above. Below is a large regression table that details the coefficients and intercepts, along with p-values.

# Running regressions for each genre
library(dplyr)         # filter()
library(modelsummary)  # msummary() / modelsummary()

# Get list of unique genres and create empty list to store the models
genres <- unique(songs$playlist_genre)
regression_models <- list()

# The same model is fitted in every genre, so define the formula once
model_formula <- track_popularity ~ valence + danceability + speechiness

# Loop through each genre, keep only that genre's songs, fit the model, store it
for (genre in genres) {
  genre_songs <- filter(songs, playlist_genre == genre)
  regression_models[[genre]] <- lm(model_formula, data = genre_songs)
}

varnames <- c("(Intercept)" = "Intercept", 
              "valence" = "Valence", 
              "danceability" = "Danceability", 
              "speechiness" = "Speechiness")

# Create summary table
summary_table <- msummary(regression_models,
                          statistic = c("s.e. = {std.error}",
                                        "p = {p.value}"),
                          gof_map = c("nobs", "r.squared", "adj.r.squared"),
                          coef_map = varnames)

summary_table

              pop           rap           rock          latin         r&b           edm
Intercept     34.174        28.850        35.547        40.531        43.749        34.671
              s.e. = 1.737  s.e. = 1.692  s.e. = 1.513  s.e. = 2.246  s.e. = 1.740  s.e. = 1.609
              p = <0.001    p = <0.001    p = <0.001    p = <0.001    p = <0.001    p = <0.001
Valence       −1.475        −5.409        −3.958        −3.616        −17.933       14.685
              s.e. = 1.644  s.e. = 1.414  s.e. = 1.777  s.e. = 1.684  s.e. = 1.676  s.e. = 1.413
              p = 0.370     p = <0.001    p = 0.026     p = 0.032     p = <0.001    p = <0.001
Danceability  19.136        27.125        15.402        10.148        10.260        −8.805
              s.e. = 2.832  s.e. = 2.294  s.e. = 2.929  s.e. = 3.266  s.e. = 2.754  s.e. = 2.589
              p = <0.001    p = <0.001    p = <0.001    p = 0.002     p = <0.001    p = <0.001
Speechiness   28.106        −12.091       5.034         14.100        1.070         0.540
              s.e. = 4.973  s.e. = 2.323  s.e. = 7.857  s.e. = 4.044  s.e. = 3.266  s.e. = 4.182
              p = <0.001    p = <0.001    p = 0.522     p = <0.001    p = 0.743     p = 0.897
Num.Obs.      5507          5746          4951          5155          5431          6043
R2            0.015         0.029         0.006         0.005         0.021         0.018
R2 Adj.       0.015         0.029         0.005         0.004         0.020         0.017

This table might be slightly difficult to parse at a glance. I’ll briefly summarize by genre.

The pop genre had a negative valence coefficient (-1.475) that was not statistically significant (p = 0.370). However, its danceability coefficient was positive (19.136), and statistically significant (p < 0.001). The speechiness coefficient was positive (28.106) and statistically significant (p < 0.001).

The rap genre had a negative valence coefficient (-5.409) that was statistically significant (p < 0.001). Its danceability coefficient was positive (27.125) and statistically significant (p < 0.001). Its speechiness coefficient was negative (-12.091) and statistically significant (p < 0.001).

The rock genre had a negative valence coefficient (-3.958) that was statistically significant (p = 0.026). It had a positive danceability coefficient (15.402) that was statistically significant (p < 0.001). It had a positive speechiness coefficient (5.034) that was not statistically significant (p = 0.522).

The latin genre had a negative valence coefficient (-3.616) that was statistically significant (p = 0.032). Its danceability coefficient was positive (10.148) and statistically significant (p = 0.002). The speechiness coefficient was also positive (14.100) and statistically significant (p < 0.001).

The r&b genre exhibited a negative valence coefficient (-17.933) that was statistically significant (p < 0.001). It demonstrated a positive danceability coefficient (10.260) that was statistically significant (p < 0.001). The speechiness coefficient (1.070) was positive but not statistically significant (p = 0.743).

In the edm genre, the valence coefficient (14.685) was statistically significant and positive (p < 0.001). The danceability coefficient (-8.805) was negative and statistically significant (p < 0.001). The speechiness coefficient (0.540) was positive but not statistically significant (p = 0.897).
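
Since the per-genre summaries come down to the sign and significance of each coefficient, a compact sign table can make the pattern easier to scan. The sketch below is illustrative: sign_table is a hypothetical helper, and it is demonstrated on synthetic data with made-up genre labels rather than on the Spotify models. Applied to a named list of lm fits such as regression_models, it should produce one column per genre.

```r
# Build a compact table of coefficient signs, flagging estimates whose
# p-value is below alpha with an asterisk.
sign_table <- function(models, alpha = 0.05) {
  sapply(models, function(model) {
    coefs <- summary(model)$coefficients
    terms <- setdiff(rownames(coefs), "(Intercept)")
    signs <- ifelse(coefs[terms, "Estimate"] > 0, "+", "-")
    stars <- ifelse(coefs[terms, "Pr(>|t|)"] < alpha, "*", "")
    setNames(paste0(signs, stars), terms)
  })
}

# Synthetic demo: danceability helps in genre_a and hurts in genre_b,
# while valence and speechiness have no true effect.
set.seed(42)
toy <- data.frame(
  playlist_genre = rep(c("genre_a", "genre_b"), each = 200),
  valence = runif(400),
  danceability = runif(400),
  speechiness = runif(400)
)
slope <- ifelse(toy$playlist_genre == "genre_a", 20, -20)
toy$track_popularity <- 40 + slope * toy$danceability + rnorm(400, sd = 5)

toy_models <- lapply(split(toy, toy$playlist_genre), function(d) {
  lm(track_popularity ~ valence + danceability + speechiness, data = d)
})
toy_signs <- sign_table(toy_models)
toy_signs
```

The table keeps only direction and significance, so it complements the full regression table rather than replacing it: magnitudes and standard errors still matter for interpretation.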

I’ve neglected speaking to the intercepts as I do not see them as incredibly relevant to the analysis - the intercepts are the supposed popularity of a song within the genre if valence, danceability, and speechiness were all 0.
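
To make the coefficients more concrete, here is a small worked example using the pop estimates from the regression table. The track's feature values are hypothetical and chosen only for illustration. Because the features are measured on a 0 to 1 scale, a coefficient is the predicted change for moving from 0 to 1, so a more realistic change of 0.1 corresponds to one tenth of the coefficient.

```r
# Pop-genre estimates copied from the regression table above.
pop_coefs <- c(intercept = 34.174, valence = -1.475,
               danceability = 19.136, speechiness = 28.106)

# Hypothetical track: feature values chosen for illustration only.
track <- c(valence = 0.5, danceability = 0.7, speechiness = 0.1)

# Predicted popularity = intercept + sum of coefficient * feature value
predicted <- pop_coefs[["intercept"]] + sum(pop_coefs[names(track)] * track)
predicted  # 49.6423

# Predicted change from raising danceability by 0.1, other features fixed
danceability_step <- pop_coefs[["danceability"]] * 0.1
danceability_step  # 1.9136
```

So even the comparatively large pop danceability coefficient translates to roughly two popularity points for a 0.1 increase in danceability.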

Below is a regression that was run on the entire dataset, with no subsetting for genre.

# Overall regression for entire dataset
regressions <- lm(track_popularity ~ valence + danceability + speechiness, data = songs)

modelsummary::modelsummary(regressions,
                           statistic = c("s.e. = {std.error}",
                                         "p = {p.value}"),
                           gof_map = c("nobs", "r.squared", "adj.r.squared"),
                           coef_map = varnames)

              (1)
Intercept     34.974
              s.e. = 0.645
              p = <0.001
Valence       1.426
              s.e. = 0.625
              p = 0.023
Danceability  10.554
              s.e. = 1.020
              p = <0.001
Speechiness   −1.277
              s.e. = 1.381
              p = 0.355
Num.Obs.      32833
R2            0.004
R2 Adj.       0.004

While many findings are statistically significant, I would hesitate to interpret them causally.

Conclusion

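The regressions are broadly consistent with the primary hypothesis, which posited that specific musical attributes could predict a track’s popularity within distinct genres: most coefficients are statistically significant, and their signs and magnitudes differ from genre to genre. The predictive power is weak, however, with R-squared values below 0.03 in every genre.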

With regard to my more specific hypotheses, the results are mixed. Danceability has a positive, statistically significant coefficient in pop, rap, and latin, as hypothesized, but it is also positive in r&b and rock, where I expected a negative relationship, and negative in edm, where I expected a positive one. Valence is negatively associated with popularity in r&b and rock, consistent with the hypothesis, but it is also negative in rap and latin and not statistically significant in pop; edm is the only genre in which higher valence is associated with higher popularity. For speechiness, only latin clearly matches the hypothesis, with a positive and significant coefficient. The coefficient is positive and significant in pop and negative and significant in rap, both opposite to what I predicted, and it is not statistically significant in rock, r&b, or edm.

While causation cannot be inferred from these models, the observed patterns offer some evidence that distinct musical features are associated with the popularity of songs within specific genres.

Based on the regression results, several conclusions can be drawn for each music genre. We noted many statistically significant findings in the regression table. For instance, in the r&b genre, higher valence is associated with a significant decrease in popularity and greater danceability with a significant increase, while the speechiness coefficient is positive but not statistically significant. However, it’s essential to note that correlation does not imply causation, and these relationships should be interpreted cautiously. The coefficient values provide insights into the direction and strength of the associations, with statistically significant coefficients indicating relationships that are unlikely to be due to chance alone. Overall, the findings provide partial support for the hypothesis that specific musical attributes predict popularity within distinct genres, as evidenced by the varied significance and directionality of coefficients across genres.

However, it’s crucial to acknowledge several limitations in the analysis. First, the models have relatively low R-squared values, suggesting that the included features explain only a small portion of the variability in popularity. Additionally, the assumption of linearity may not fully capture the complex relationships between musical features and popularity. Threats to inference include the possibility of confounding variables not accounted for in the models. Addressing missing data and exploring additional potential confounders could enhance the robustness of the analysis. If more time and resources were available, conducting more sophisticated analyses, such as considering nonlinear relationships or exploring interactions between features, could further refine the understanding of the factors influencing popularity in different music genres.
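
One way to test formally whether the feature slopes differ by genre is to compare a model with common slopes against one that interacts each feature with genre, using a nested-model F-test. This check was not part of the analysis above. The sketch below runs on synthetic data with made-up genre labels, purely to show the mechanics.

```r
set.seed(7)

# Synthetic data: the danceability slope differs across three toy genres.
toy <- data.frame(
  playlist_genre = factor(rep(c("genre_a", "genre_b", "genre_c"), each = 150)),
  valence = runif(450),
  danceability = runif(450),
  speechiness = runif(450)
)
true_slope <- c(genre_a = 20, genre_b = 0, genre_c = -20)[
  as.character(toy$playlist_genre)
]
toy$track_popularity <- 40 + true_slope * toy$danceability +
  rnorm(450, sd = 5)

# Common slopes: genre shifts only the intercept
pooled <- lm(track_popularity ~ playlist_genre + valence + danceability +
               speechiness, data = toy)

# Genre-specific slopes: every feature interacts with genre
by_genre <- lm(track_popularity ~ playlist_genre *
                 (valence + danceability + speechiness), data = toy)

# F-test of the six added interaction terms
slope_test <- anova(pooled, by_genre)
slope_test
```

A small p-value in the second row would indicate that allowing slopes to vary by genre fits significantly better than forcing a single slope per feature. On the real data this would replace the six separate regressions with one model in which genre differences can be tested directly.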

There appear to be relationships between popularity and valence, danceability, and speechiness, and these relationships differ across genres.

Caveats and Limitations